Abstract

We consider the problem of influence maximization, the problem of maximizing
the number of people that become aware of a product by finding the ‘best’ set
of ‘seed’ users to expose the product to. Most prior work on this topic assumes
that we know the probability of each user influencing each other user, or that we have
data that lets us estimate these influences. However, this information is typically
not initially available or is difficult to obtain. To avoid this assumption, we adopt
a combinatorial multi-armed bandit paradigm that estimates the influence probabilities
as we sequentially try different seed sets. We establish bounds on the
performance of this procedure under the existing edge-level feedback as well as a
novel and more realistic node-level feedback. Beyond our theoretical results, we
describe a practical implementation and experimentally demonstrate its efficiency
and effectiveness on four real datasets.

1 Introduction 

Viral marketing aims to leverage a social network to spread awareness about a specific product in
the market through information propagation via word-of-mouth. Specifically, the marketer aims to
select a fixed number of ‘influential’ users (called seeds) to give free products or discounts to. The
marketer assumes that these users will influence their neighbours and, by transitivity, other users in
the social network. This will result in information propagating through the network as an increasing
number of users adopt or become aware of the product. The marketer typically has a budget on
the number of free samples or discounts that can be given, so she must strategically choose seeds
so that the maximum number of people in the network become aware of the product. The goal is
to maximize the spread of this influence, and this problem is referred to as influence maximization
(IM) [?].

In their seminal paper, Kempe et al. [?] studied the IM problem under well-known probabilistic
information diffusion models including the independent cascade (IC) and linear threshold (LT)
models. While the problem is NP-hard under these models, there have been numerous papers on
efficient approximations and heuristic algorithms (see Section 2). But prior work on IM assumes
that, in addition to the network structure, we either know the pairwise user influence probabilities
or have past propagation data from which these probabilities can be learned. However, in
practice the influence probabilities are often not available or are hard to obtain. To overcome this,
the initial series of papers following [?] simply assigned influence probabilities using heuristic
means. However, Goyal et al. [19] showed empirically that learning the influence probabilities
from propagation data is critical to obtaining high-quality seed sets and spread.

In this work, we consider the practical situation where even the propagation data may not be
available. We adopt a combinatorial multi-armed bandit (CMAB) paradigm and consider an IM campaign
consisting of multiple rounds (as in another recent work [12]). Each round amounts to an IM
attempt and incurs a regret in the influence spread because of the lack of knowledge of the influence
probabilities. We seek to minimize the accumulated regret incurred by choosing suboptimal seed
sets over multiple rounds. A new marketer may begin with no knowledge (other than the graph
structure), and at each round we can choose seeds that improve our knowledge and/or that lead to a
large spread, leading to a classic exploration-exploitation trade-off. (An alternative to minimizing the
regret is to learn the influence probabilities in the network as efficiently as possible. This is
referred to as pure exploration [?] and we briefly explore it in Appendix C.) As in prior work, we
first consider “edge-level” feedback, where we assume we can observe whether influence propagated
via each edge in the network (Section 3). However, we also propose a novel “node-level” feedback
mechanism which is more realistic (Section 4): it only assumes we can observe whether each node
became active (e.g., adopted a product) or not, as opposed to knowing who influenced that user. We
establish bounds on the regret achieved by the algorithms under both kinds of feedback mechanisms.
We further present regret minimization algorithms (Section 5) and conduct extensive experiments on
four real datasets to evaluate the effectiveness of the proposed algorithms (Section 6). All proofs
appear in the supplementary material, which also explores the effect of priors on performance and
discusses the alternative objective of network exploration.

2 Motivation and Related Work 

Influence Maximization: We model a social network as a probabilistic directed graph $G = (V, E, P)$
with nodes $V$ representing users, edges $E$ representing connections/contacts, and edge
weights $P: E \to [0,1]$. The influence probability $P(u, v)$ represents the probability with which
user $v$ will perform an action given that it is performed by $u$. A stochastic diffusion model $D$
governs how information spreads from nodes to their neighbours in the network. Given a seed set $S$,
the expected number of nodes of $G$ influenced by $S$ under the model $D$, denoted $\sigma_D(S)$ (just $\sigma(S)$
when $D$ is obvious from context), is called the (expected) influence spread of $S$. Given $G$ and a
budget $k$ on the number of seeds to be selected, IM aims to find the seed set $S$ of size $k$ which will
lead to the maximum influence spread $\sigma(S)$ under $D$:

$$S^* = \arg\max_{|S| \le k} \sigma(S). \qquad (1)$$

Although IM is NP-hard under standard diffusion models, the expected spread function $\sigma_D(S)$ is
monotone and submodular. Solving Eq. (1) thus reduces to maximizing a submodular function
under a cardinality constraint, a problem that can be solved to within a $(1 - 1/e)$-approximation
using a greedy algorithm [25]. There have been a variety of extensions, including the development of
scalable heuristics, alternative diffusion models, and scalable approximation algorithms [?].
We refer the reader to [8] for a detailed survey. Most work on IM
assumes knowledge of the influence probabilities, but there is a growing body of work on learning
the influence probabilities from data. Typically, the data is a set of diffusions (also called cascades)
that happened in the past, specified in the form of a log of actions by network users. Learning
influence probabilities from available cascades has been studied for both discrete-time models [?] and
continuous-time models [?]. However, in many real datasets the cascades are not available, and for
these datasets the existing approaches for learning the influence probabilities cannot be used.
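The greedy algorithm referred to above can be sketched as follows under the IC model, with $\sigma(S)$ estimated by Monte Carlo simulation. This is a minimal illustration, not the TIM oracle used in the experiments; the graph representation (a dict of out-neighbour lists plus a dict of edge probabilities), the function names and the number of simulations are assumptions chosen for readability.

```python
import random


def simulate_ic(graph, probs, seeds, rng):
    """Simulate one independent-cascade diffusion; return the number of active nodes.

    graph: dict mapping each node to a list of its out-neighbours
    probs: dict mapping each edge (u, v) to its influence probability p_{u,v}
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # u gets exactly one chance to activate each still-inactive out-neighbour v
                if v not in active and rng.random() < probs[(u, v)]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, probs, seeds, num_sims, rng):
    """Monte Carlo estimate of the expected spread sigma(S)."""
    return sum(simulate_ic(graph, probs, seeds, rng) for _ in range(num_sims)) / num_sims


def greedy_im(graph, probs, k, num_sims=200, seed=0):
    """Greedy seed selection: repeatedly add the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    seeds = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for v in sorted(nodes - set(seeds)):
            spread = estimate_spread(graph, probs, seeds + [v], num_sims, rng)
            if spread > best_spread:
                best, best_spread = v, spread
        seeds.append(best)
    return seeds
```

Maximizing $\sigma(S \cup \{v\})$ over $v$ is the same as maximizing the marginal gain, and submodularity is what makes this myopic choice $(1 - 1/e)$-approximate when $\sigma$ is evaluated exactly (Monte Carlo error aside).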

Multi-armed Bandit: The stochastic multi-armed bandit (MAB) paradigm was first proposed
in [22]. In the traditional framework, there are $m$ arms, each of which has an unknown reward
distribution. The bandit game proceeds in rounds, and in every round $s$ an arm is played and a
corresponding reward is generated by sampling the reward distribution for that arm. This game continues
for a fixed number of rounds $T$. Our goal is to minimize the regret resulting from playing
suboptimal arms across rounds (regret minimization). This results in a trade-off between exploration
(sampling arms to learn about them) and exploitation (pulling the arm which we think gives the
highest expected reward). Auer et al. [?] proposed algorithms which achieve the optimal regret
of $O(\log T)$ over $T$ rounds. The combinatorial multi-armed bandit paradigm is an extension where
we can pull a set of arms (a ‘superarm’) together [?]. The resulting reward could be
a linear [?] or non-linear [?] combination of the individual rewards. Gai et al. [?] and Chen
et al. [11] consider a CMAB framework with access to an approximation oracle to find the best
(super)arm to be played in each round. Gopalan et al. [?] propose a Thompson sampling based
algorithm for regret minimization. Chen et al. [?] introduce the notion that triggering superarms
can also probabilistically trigger other arms. They target both ad placement on web pages and viral
marketing applications under semi-bandit feedback [?]. They propose an algorithm based on the
upper confidence bound (UCB), called combinatorial UCB (CUCB), which obtains an optimal regret
of $O(\log T)$. However, they assume the often-unrealistic “edge-level” feedback and do not
experimentally test their algorithm. In contrast, in this work we consider the more realistic “node-level”
feedback and show that our proposed algorithm gives strong empirical performance. More recently,
Lei et al. [23] studied the related, but very different, problem of maximizing the number of distinct
nodes activated across rounds. However, they assume edge-level feedback, do not establish any quality
guarantees, and do not theoretically compare the performance of their approach with that achievable
when influence probabilities are known. Further, they synthetically assigned “true” probabilities,
while we also test our algorithms on datasets where these probabilities are learned from real data.

CMAB | Symbol | Mapping to IM
Base arm | $i$ | Edge $(u, v)$
Reward for arm $i$ in round $s$ | $X_{i,s}$ | Status (live/dead) of edge $(u, v)$
Mean of distribution for arm $i$ | $\mu_i$ | Influence probability $p_{u,v}$
Superarm | $A$ | Union of outgoing edges $E_S$ from nodes in seed set $S$
No. of times $i$ is triggered in $s$ rounds | $T_{i,s}$ | No. of times $u$ becomes active in $s$ diffusions
Reward in round $s$ | $r_s$ | Spread $\sigma_s$ in the $s$-th IM attempt

Table 1: Mapping of the CMAB framework to IM

3 CMAB Framework for IM 

3.1 Review of CMAB 

The CMAB framework consists of $m$ base arms. Each arm $i$ is associated with a random variable
$X_{i,s}$ which denotes the outcome or reward of triggering arm $i$ in trial $s$. The reward $X_{i,s}$ is
bounded on the support $[0,1]$ and is independently and identically distributed according to some
unknown distribution with mean $\mu_i$. In each of the $T$ rounds, a superarm $A$ (a set of base arms) is
played, which triggers all arms in $A$. In addition, some of the other arms may get probabilistically
triggered. Let $p_i^A$ denote the triggering probability of arm $i$ if the superarm $A$ is played (observe that
$p_i^A = 1$ for $i \in A$). The reward obtained in each round $s$ can be a (possibly non-linear) function of
the rewards $X_{i,s}$ for each arm $i$ that gets triggered in that round. Let $T_{i,s}$ denote the total number
of times arm $i$ has been triggered by round $s$. For the special case of $s = T$, we use the notation
$T_i := T_{i,T}$. Each time an arm $i$ is triggered, we use the observed reward to update its mean estimate
$\hat{\mu}_i$. The superarm that is expected to give the highest reward is selected in each round by an oracle $O$.
The oracle takes as input the current mean estimates $\hat{\boldsymbol\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_m)$ and outputs an appropriate
superarm $A$. In order to accommodate intractable problems, the framework of [11, 12] assumes
that the oracle provides an $(\alpha, \beta)$-approximation to the optimal solution: with
probability $\beta$, the oracle outputs a superarm $A$ that attains an $\alpha$-approximation to the optimal solution.

3.2 Adaptation to IM 

Though our framework is valid for any discrete-time diffusion model, we assume the IC diffusion
model in our discussion. This model uses discrete steps. At time $t = 0$, only the seed nodes are
active. Each active node $u$ gets one attempt to influence/activate each of its inactive out-neighbours
$v$ in the next time step. This activation attempt succeeds with influence probability $p_{u,v} := P(u, v)$.
An edge along which an activation attempt succeeded is said to be live, and other edges are said to
be dead. At a given time $t$, an inactive node $v$ may have multiple parents which were activated at time
$t - 1$. These parents are capable of activating $v$ at time $t$, and we refer to them as the active
parents of $v$ at time $t$. There are $2^{|E|}$ possible samples of the probabilistic network $G$ (each edge
can be live or dead), referred to as possible worlds in the IM literature. The sample corresponding
to the diffusion in the real world is referred to as the “true” possible world and results in a labelling
of nodes as influenced (active) or not influenced. The actual spread is the number of nodes reachable
from the selected seed nodes in the true possible world, and we denote it by $\sigma$. Table 1 gives our
mapping of the various components of the CMAB framework to IM. Note that since each edge can
be either live or dead in the true diffusion, $X_{i,s} \in \{0, 1\}$ and we can assume a Bernoulli distribution
on these values. We describe the CMAB framework for IM in Algorithm 1. In each round, the regret
minimization algorithm $\mathcal{A}$ selects a seed set $S$ with $|S| = k$ and plays the corresponding superarm
$E_S$. $S$ can be selected either randomly (EXPLORE) or by solving the IM problem with the current
influence probability estimates $\hat{\boldsymbol\mu}$ (EXPLOIT). The details for solving Eq. (1) are encapsulated in
the oracle $O$, which takes as input the graph $G$ and $\hat{\boldsymbol\mu}$, and outputs a seed set $S$ under the cardinality
constraint $k$. For the case of IM, $O$ constitutes a $(1 - 1/e, 1 - 1/|E|)$-approximation oracle [12]. Notice
that the well-known greedy algorithm used for IM can serve as such an oracle. Once the superarm is
played, information diffuses in the network and a subset of network edges become live, which leads
to a subset of nodes becoming active. The reward $X_{i,s}$ for these edges is 1. Note that the reward
$\sigma(S)$ is the number of active nodes at the end of the diffusion process and is thus a non-linear
function of the rewards of the triggered arms (edges). After observing a diffusion, the vector of mean
estimates $\hat{\boldsymbol\mu}$ needs to be updated. In this context, the notion of a feedback mechanism $M$ plays an important
role. It characterizes the information available after a superarm is played. This information is used to
update the model to improve the mean estimates (UPDATE in Algorithm 1). Let $S^*$ be the solution
to Eq. (1) and let $\sigma^* = \sigma(S^*)$ be the optimal expected spread. Since IM is NP-hard, even if the true
influence probabilities are known, we can only hope to achieve an expected spread of $\alpha\beta\sigma^*$, where
$\alpha = 1 - 1/e$ and $\beta = 1 - 1/|E|$. We let $S_s$ be the seed set chosen by $\mathcal{A}$ in round $s$. The regret incurred
by $\mathcal{A}$ is then defined by

$$\mathrm{Reg}_{\boldsymbol\mu}(\alpha, \beta) = T\alpha\beta\sigma^* - \mathbb{E}\Big[\sum_{s=1}^{T} \sigma(S_s)\Big], \qquad (2)$$

where the expectation is over the randomness in the seed sets output by the oracle.

Algorithm 1: CMAB FRAMEWORK FOR IM(Graph G = (V, E), budget k, Feedback mechanism M, Algorithm A)

    Initialize μ̂;
    ∀i initialize T_i = 0;
    for s = 1 → T do
        IS-EXPLOIT is a boolean set by algorithm A;
        if IS-EXPLOIT then
            E_S = EXPLOIT(G, μ̂, O, k)
        else
            E_S = EXPLORE(G, k)
        Play the superarm E_S and observe the diffusion cascade c;
        μ̂ = UPDATE(c, M);
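The control flow of Algorithm 1 can be sketched in Python as a loop that delegates every decision to pluggable callbacks. This is an illustrative sketch, not the authors' implementation: the callback names, their signatures and the use of dictionaries keyed by edge are assumptions.

```python
def cmab_im(edges, k, T, is_exploit, exploit, explore, play, update):
    """Skeleton of the CMAB loop for IM (Algorithm 1).

    edges:      list of base arms (u, v)
    is_exploit: s -> bool, the IS-EXPLOIT flag chosen by the bandit algorithm in round s
    exploit:    (mu_hat, k) -> seed list, e.g. an IM oracle run on the current estimates
    explore:    k -> seed list, e.g. k random nodes
    play:       seeds -> observed cascade (the feedback produced by playing E_S)
    update:     (mu_hat, counts, cascade) -> None, the feedback mechanism M (in place)
    """
    mu_hat = {e: 0.0 for e in edges}   # initialise the mean estimates
    counts = {e: 0 for e in edges}     # T_i = 0 for every arm i
    chosen = []
    for s in range(1, T + 1):
        seeds = exploit(mu_hat, k) if is_exploit(s) else explore(k)
        cascade = play(seeds)
        update(mu_hat, counts, cascade)
        chosen.append(seeds)
    return mu_hat, counts, chosen
```

Keeping EXPLORE, EXPLOIT and UPDATE as arguments means the same loop can host CUCB, ε-greedy, Thompson sampling or pure exploitation, and any of the three feedback mechanisms.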

The usual feedback mechanism is the edge-level feedback proposed by [?], where we assume that
we know the status (live or dead) of each triggered edge in the “true” possible world. The mean
estimates of the arms' distributions can then be updated using Eq. (3):

$$\hat{\mu}_i = \frac{\sum_{s=1}^{T} X_{i,s}}{T_i}, \qquad (3)$$

where the sum runs over the rounds in which arm $i$ was triggered.

4 Node-Level Feedback 


Edge-level feedback is often not realistic because the success or failure of activation attempts is
generally not observable. Unlike the status of edges, it is quite realistic and intuitive that we can observe the
status of each node: did the user buy or adopt the marketed product? While this is a more realistic
assumption, the disadvantage of node-level feedback is that updating the mean estimate for each edge
is more challenging. Under edge-level feedback, we assume that we know the status of each edge
$(u_j, v)$, $1 \le j \le K$, and use it to update the mean estimates. Under node-level feedback, any of the
active parents may be responsible for activating a node $v$ and we do not know which one it was,
leading to a credit assignment problem. We describe two ways to resolve this problem.

4.1 Maximum Likelihood Approach 

An obvious way to infer the edge probabilities given the status of each node in the cascade is to
use maximum likelihood estimation (MLE). We use an MLE formulation similar to those proposed
in [?]. These papers describe an offline method for learning influence probabilities, where a
fixed set of past diffusion cascades is given as input. A diffusion cascade captures how information
spreads in the network and records whether and when each node became active in the
diffusion. The log-likelihood function for a given set of cascades $C$ is given by:

$$\log L(\mathbf{p}) = \sum_{c=1}^{|C|} \sum_{v \in V} \log L_v^c(\mathbf{p}), \qquad (4)$$

where $L_v^c(\mathbf{p})$ models the likelihood of observing the cascade $c \in C$ with respect to node $v$, given the influence
probability estimates $\mathbf{p}$. Let $t_u^c$ be the timestep in the diffusion process at which node $u$ becomes
active in cascade $c$. If $p_{u,v}$ is the influence probability of the edge $(u, v)$ and $N_{in}(v)$ is the set of
incoming neighbours of $v$, then $\log L_v^c$ under the IC model can be written as follows:


$$\log L_v^c(\mathbf{p}) = \sum_{u \in A_c} \ln(1 - p_{u,v}) + \ln\Big(1 - \prod_{u \in B_c} (1 - p_{u,v})\Big). \qquad (5)$$

Here, $A_c = \{u \in N_{in}(v) : t_u^c \le t_v^c - 2\}$ and $B_c = \{u \in N_{in}(v) : t_u^c = t_v^c - 1\}$. The first term
corresponds to unsuccessful attempts by active parents to activate node $v$, whereas the second term
corresponds to the successful activation attempts. Using the transformation $\theta_{u,v} = -\ln(1 - p_{u,v})$,
Eq. (5) becomes


$$\log L_v^c(\boldsymbol\theta) = -\sum_{u \in A_c} \theta_{u,v} + \ln\Big(1 - \exp\Big(-\sum_{u \in B_c} \theta_{u,v}\Big)\Big). \qquad (6)$$


It can be verified that the negative log-likelihood, i.e., the negative of Eq. (6), is convex in $\boldsymbol\theta$. It is also separable across
nodes and can be minimized independently for each node using methods like gradient descent. In
our setting, we do not have a batch of available cascades but generate cascades on the fly. We can
store the generated cascades and use them to find the maximum likelihood estimator for each node
in every round of an IM campaign in our bandit framework. As a consequence of observing only
node statuses, we incur error in the inferred rewards for each arm, which we characterize next. All
proofs appear in Appendix A.
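A minimal sketch of this per-node estimation, assuming each stored cascade has already been reduced to the index sets $A_c$ and $B_c$ of Eq. (5) for the node $v$ at hand. It averages the negative of Eq. (6) over cascades, takes projected gradient steps in $\boldsymbol\theta$ and maps back to $p_{u,v} = 1 - e^{-\theta_{u,v}}$. The function names, step size, iteration count and box $[10^{-6}, 10]$ are illustrative assumptions, as is the convention that an empty $B_c$ marks a cascade in which $v$ never became active (only the first term is then kept).

```python
import math


def nll_and_grad(theta, observations):
    """Average negative of Eq. (6) over cascades, and its gradient, for one node v.

    theta: list of theta_{u,v} = -ln(1 - p_{u,v}), one entry per in-neighbour u of v
    observations: list of (A, B) index lists, one pair per cascade; A holds parents
        active at t_v - 2 or earlier, B holds parents active exactly at t_v - 1.
        Assumption: B == [] means v never became active, so only the first term is kept.
    """
    loss, grad = 0.0, [0.0] * len(theta)
    for A, B in observations:
        for u in A:                            # failed attempt contributes -ln(1 - p) = theta
            loss += theta[u]
            grad[u] += 1.0
        if B:                                  # at least one parent in B succeeded
            s = sum(theta[u] for u in B)
            loss -= math.log(1.0 - math.exp(-s))
            for u in B:
                grad[u] -= 1.0 / math.expm1(s)   # derivative of -ln(1 - e^{-s})
    n = max(1, len(observations))
    return loss / n, [g / n for g in grad]


def mle_node(num_parents, observations, lr=0.1, iters=2000, theta_init=0.5, lo=1e-6, hi=10.0):
    """Projected gradient descent on the averaged negative log-likelihood; returns p estimates."""
    theta = [theta_init] * num_parents
    for _ in range(iters):
        _, grad = nll_and_grad(theta, observations)
        # clipping to [lo, hi] keeps every p_{u,v} strictly inside (0, 1)
        theta = [min(hi, max(lo, t - lr * g)) for t, g in zip(theta, grad)]
    return [1.0 - math.exp(-t) for t in theta]
```

For example, with three cascades in which a single parent activated $v$ and seven in which it failed, the estimate approaches $3/10$: setting the derivative of $7\theta - 3\ln(1 - e^{-\theta})$ to zero gives $e^{\theta} = 10/7$, i.e. $p = 0.3$.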

Theorem 1. Let $\hat{p}^{E}_{u_i,v}$ and $\hat{p}^{N}_{u_i,v}$ respectively denote the probability estimates learned from edge-level
and node-level feedback using maximum likelihood. Then $|\hat{p}^{N}_{u_i,v} - \hat{p}^{E}_{u_i,v}|$ is bounded above by an
explicit expression in $\hat{p}^{E}_{u_i,v}$, $p_{\max}$, $\phi$, $P_{\max}$ and $P_{\min}$, where $\phi$ is the fraction of cascades in which
edge $(u_i, v)$ is dead, over those where $v$ is active, and $P_{\max}$ and $P_{\min}$ are the upper and lower bounds on
the quantity $\prod_k (1 - p_{u_k,v})$.


This result bounds the price we pay, in terms of error, for adopting the more realistic node-level
feedback instead of the edge-level feedback mechanism.


4.2 Online optimization 

Unfortunately, the time complexity of the above approach is $O(|E|T^2)$, which does not scale to
networks with a large number of edges. To mitigate this, we adapt a result from online convex
optimization [32] for learning the edge probabilities. Zinkevich [32] developed an online convex
optimization framework for minimizing a sequence of convex functions over a convex set. In our
case, we solve an online convex optimization problem for each node in the network. We first
describe some notation used in Zinkevich's framework. Given a fixed convex set $F$, a series of convex
functions $c_s: F \to \mathbb{R}$ stream in, with $c_s$ being revealed at time $s$. At each timestep $s$, the online
algorithm must choose a point $x_s \in F$ before seeing the convex function $c_s$. The objective is to
minimize the total cost across $T$ timesteps, i.e., $\mathrm{Cost}_{on}(T) = \sum_{s \in [T]} c_s(x_s)$, for a given finite $T$. In
the offline setting, the cost functions $c_s$ for $s \in [1, T]$ are revealed in advance, and we are required
to choose a single point $x \in F$ that minimizes $\mathrm{Cost}_{off}(T) = \sum_{s \in [T]} c_s(x)$. The loss¹ of the online
algorithm compared to the offline algorithm is denoted as $\mathrm{Loss}(T) = \mathrm{Cost}_{on}(T) - \mathrm{Cost}_{off}(T)$.
Note that the above framework makes no distributional assumption on the streaming convex functions.
Zinkevich proposed a gradient descent update for choosing the estimates $x_s$:

$$x_{s+1} = x_s - \eta_s \nabla c_s(x_s), \qquad (7)$$

where $\eta_s$ is the step size used in round $s$ and $\nabla c_s(x_s)$ is the gradient of the cost function
revealed at round $s$ (in Zinkevich's analysis the updated point is projected back onto $F$). He proved that if we use Eq. (7) and set the step size $\eta_s$ according to $1/\sqrt{s}$, the
average loss $\mathrm{Loss}(T)/T$ goes down as $O(1/\sqrt{T})$. Concretely, for each node $v$, $x_s$ corresponds to our $\boldsymbol\theta_s$ variables and
$c_s$ corresponds to the negative log-likelihood function $-\log L_v^s(\boldsymbol\theta)$. Note that we cannot ensure
that the cascades are i.i.d., making the usual SGD methods inapplicable. The time complexity of this
online procedure is $O(|E|T)$. We have the following theorem, which extends a similar result of [32].
Let $\boldsymbol\theta_{batch}$ be the set of parameters learned offline with the cascades available in batch, $\boldsymbol\theta_s$ be
the estimate for the parameters in round $s$ of an IM campaign in the CMAB framework, $d_v$ be the
in-degree of $v$, and $\theta_{\max} = -\ln(1 - p_{\max})$.

¹ We use the term loss instead of regret to avoid confusion with the notion of regret in the CMAB framework.
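A minimal sketch of this online update for one node, assuming one gradient step per round on the latest cascade's negative log-likelihood with $\eta_s = \eta_0/\sqrt{s}$, followed by projection onto the box $[\epsilon, \theta_{\max}]^{d_v}$ with $\theta_{\max} = -\ln(1 - p_{\max})$. The small lower bound $\epsilon$ (which keeps the logarithm finite), the value $\eta_0 = 0.1$, the default $p_{\max} = 0.2$ (taken from the experimental setup) and the $(A, B)$ encoding of a cascade are illustrative assumptions.

```python
import math


def ogd_step(theta, A, B, s, eta0=0.1, theta_max=-math.log(1.0 - 0.2), eps=1e-6):
    """One online gradient step (Eq. (7)) for a single node v after cascade s (s >= 1).

    theta: current theta_{u,v} for each in-neighbour u of v (list indexed by parent)
    A, B:  parent indices that failed / were active at t_v - 1 in the latest cascade;
           B == [] is taken to mean that v never became active (assumption)
    """
    grad = [0.0] * len(theta)
    for u in A:
        grad[u] += 1.0                           # failed attempt: d(theta)/d(theta) = 1
    if B:
        total = sum(theta[u] for u in B)
        for u in B:
            grad[u] -= 1.0 / math.expm1(total)   # gradient of -ln(1 - exp(-total))
    eta = eta0 / math.sqrt(s)                    # step size decaying as 1/sqrt(s)
    # projection onto the box [eps, theta_max]^{d_v}
    return [min(theta_max, max(eps, t - eta * g)) for t, g in zip(theta, grad)]
```

Each call costs $O(d_v)$, so one pass over all nodes per round is $O(|E|)$, matching the $O(|E|T)$ total above.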

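Theorem 2. If we use Eq. (7) for updating $\boldsymbol\theta$ with $\eta_s$ decreasing as $1/\sqrt{s}$, the following holds:

$$\sum_{s=1}^{T}\Big[\log L_v^s(\boldsymbol\theta_{batch}) - \log L_v^s(\boldsymbol\theta_s)\Big] \le \frac{d_v\,\theta_{\max}^2\sqrt{T}}{2} + \Big(\sqrt{T} - \frac{1}{2}\Big)\Gamma^2, \qquad (8)$$

where $\Gamma = \max_{s \in [T]} \|\nabla(-\log L_v^s(\boldsymbol\theta_s))\|_2$ is the maximum L2-norm of the gradient of the negative
log-likelihood function over all rounds.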

The average loss $\mathrm{Loss}(T)/T$ can be seen to approach 0 as $T$ increases. This shows that with sufficiently
many rounds $T$, the parameters learned by the online MLE algorithm are nearly as good as those
learned by the offline algorithm. Since there is a one-to-one mapping between $\theta$ and $p$ values, as $T$
increases the parameters $\mathbf{p}_s$ tend to $\mathbf{p}_{batch}$, which in turn approach the “true” parameters as the size
of the batch, $T$, increases.
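To see why the average loss vanishes, divide the right-hand side of Eq. (8), which has the form $d_v\theta_{\max}^2\sqrt{T}/2 + (\sqrt{T} - 1/2)\Gamma^2$, by $T$ (assuming $\Gamma$ stays bounded as $T$ grows):

$$\frac{\mathrm{Loss}(T)}{T} \le \frac{d_v\,\theta_{\max}^2}{2\sqrt{T}} + \Big(\frac{1}{\sqrt{T}} - \frac{1}{2T}\Big)\Gamma^2 = O\Big(\frac{1}{\sqrt{T}}\Big).$$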


4.3 Frequentist Approach 


In typical social networks, the influence probabilities are very small. To model this special case, we
propose an alternative simple and scalable approach. Low influence probabilities cause the number
of active parents, i.e., $|B_c|$, to be small. We propose a scheme whereby we choose one of the active
parents of $v$, say $u_i$, uniformly at random, and assign the credit of activating $v$ to $u_i$. The
probability of assigning credit to any one of $K_c$ active parents is $1/K_c$. That is, edge $(u_i, v)$ is given
a reward of 1, whereas the edges $(u_j, v)$ corresponding to the other active parents $u_j$, $j \neq i$, are assigned
a zero reward. We then follow the same update formula as described for the edge-level feedback
model. Owing to the inherent uncertainty in node-level feedback, we may make mistakes
in credit assignment: we may infer an edge to be live while it is dead in the true world, or vice
versa. We term the probability of such faulty inference the failure probability $\rho$ under node-level
feedback. An important question is whether we can bound this probability, since
failures could ultimately affect the achievable regret and the error in the learned probabilities. The
following result settles this question.
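The credit-assignment rule can be sketched as below. It is a minimal illustration, assuming the active parents of $v$ (the set $B_c$) are already known for the cascade; the function name and the dictionary output keyed by edge are hypothetical. The returned rewards can then be fed to the same running-mean update as edge-level feedback.

```python
import random


def frequentist_credit(v, active_parents, rng):
    """Frequentist credit assignment for an active node v (Section 4.3).

    One active parent u_i, drawn uniformly at random, is credited with activating v
    (reward 1 on edge (u_i, v)); every other active parent gets reward 0.
    """
    chosen = rng.choice(active_parents)
    return {(u, v): (1 if u == chosen else 0) for u in active_parents}
```

Each of the $K_c$ active parents is credited with probability $1/K_c$, which is where errors enter once more than one parent's edge was actually live or the credited edge was dead.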

Theorem 3. Let $p_{\min}$ and $p_{\max}$ be the minimum and maximum true influence probabilities in the
network. Consider a particular cascade $c$ and any active node $v$ with $K_c$ active parents. The failure
probability $\rho$ under frequentist node-level feedback for node $v$ is bounded above by an explicit function
of $p_{\min}$, $p_{\max}$ and $K_c$, involving a product of $(1 - p_{\max})$ factors over the active parents $u_k$, $k \neq i$. (9)


Suppose $\hat{\mu}_i^E$ and $\hat{\mu}_i^N$ are the inferred influence probabilities for the edge corresponding to arm $i$ using
edge-level and node-level feedback, respectively. Then the relative error in the learned influence
probability is given by:

$$\frac{\hat{\mu}_i^N - \hat{\mu}_i^E}{\hat{\mu}_i^E} = \rho\left(\frac{1}{\hat{\mu}_i^E} - 2\right). \qquad (10)$$

From Eq. (10), we observe that as $K_c$ increases, the error in the mean estimates increases (through $\rho$), and it is then
better to use the maximum likelihood approach for credit distribution. In Section 6, we empirically
find typical values of $p_{\max}$, $p_{\min}$ and $K_c$ on real datasets and verify that the failure probability is
indeed small. We also find that the proposed node-level feedback achieves competitive performance
compared to edge-level feedback.
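One way to see the form of Eq. (10), under the simplifying assumption that each node-level observation of edge $i$ disagrees with the true edge status independently with probability $\rho$ (a live edge recorded as dead, or a dead edge as live), is to compare the expected estimates:

$$\mathbb{E}[\hat{\mu}_i^N] = (1-\rho)\,\hat{\mu}_i^E + \rho\,(1 - \hat{\mu}_i^E) = \hat{\mu}_i^E + \rho\,(1 - 2\hat{\mu}_i^E) \;\Longrightarrow\; \frac{\mathbb{E}[\hat{\mu}_i^N] - \hat{\mu}_i^E}{\hat{\mu}_i^E} = \rho\left(\frac{1}{\hat{\mu}_i^E} - 2\right).$$

The error is positive whenever $\hat{\mu}_i^E < 1/2$, the regime of small influence probabilities targeted here; the formal argument is in Appendix A.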


5 Regret Minimization Algorithms 


As can be seen from Algorithm 1, the basic components in the framework are the EXPLORE,
EXPLOIT and UPDATE subroutines. EXPLORE outputs a random subset $S$ of size $k$ as the seed set,
whereas EXPLOIT consults the oracle $O$ and outputs the seed set that (approximately) maximizes
the spread according to the current mean estimates $\hat{\boldsymbol\mu}$. UPDATE examines the latest cascade $c$ and
updates the parameters using the desired feedback mechanism $M$. Thus UPDATE may correspond
to edge-level feedback (Eq. (3)), node-level feedback with the frequentist update (Section 4.3), or node-level
feedback with the MLE update (Section 4.1). In the remainder of this section, we give four ways to instantiate
algorithm $\mathcal{A}$. They all invoke the EXPLORE, EXPLOIT and UPDATE subroutines.


Upper Confidence Bound: The Combinatorial Upper Confidence Bound (CUCB) algorithm was
proposed in [?] and theoretically shown to achieve logarithmic regret under edge-level feedback.
The algorithm maintains an overestimate $\bar{\mu}_i$ of each mean estimate $\hat{\mu}_i$. More precisely,
$\bar{\mu}_i = \hat{\mu}_i + \sqrt{\frac{3 \ln s}{2 T_i}}$, where $s$ is the current round. Exploitation using the $\bar{\mu}_i$ values as input leads to implicit exploration and is able to achieve
optimal regret [?].
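The CUCB input to the oracle can be computed as sketched here. The cap at $p_{\max}$ mirrors the reset described in the experimental setup; giving never-triggered edges the cap value, the function name and the dictionary layout are illustrative assumptions.

```python
import math


def cucb_estimates(mu_hat, counts, s, p_max=0.2):
    """CUCB overestimates bar_mu_i = mu_hat_i + sqrt(3 ln s / (2 T_i)), capped at p_max.

    mu_hat: dict edge -> current mean estimate
    counts: dict edge -> T_i, the number of times the edge has been triggered
    s:      current round (s >= 1)
    """
    bar_mu = {}
    for edge, m in mu_hat.items():
        t = counts.get(edge, 0)
        if t == 0:
            bar_mu[edge] = p_max   # never triggered: most optimistic admissible value (assumption)
        else:
            bonus = math.sqrt(3.0 * math.log(s) / (2.0 * t))
            bar_mu[edge] = min(p_max, m + bonus)
    return bar_mu
```

Because the bonus shrinks only as $1/\sqrt{T_i}$, rarely triggered edges keep large overestimates, which is consistent with the slow decrease of CUCB's regret reported in Section 6.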

ε-Greedy: Another algorithm proposed in [11] is the ε-Greedy strategy. In each round $s$, this
strategy performs exploration with probability $\epsilon_s$ and exploitation with probability $1 - \epsilon_s$. Chen et
al. [?] show that if $\epsilon_s$ is annealed as $1/s$, logarithmic regret can be achieved under edge-level
feedback.
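The annealed schedule reduces to a single comparison per round. A minimal sketch, assuming $\epsilon_s = \epsilon_0/s$ clipped to 1 (with $\epsilon_0 = 5$, the value used in Section 6, the first five rounds always explore); the function name is hypothetical:

```python
import random


def epsilon_greedy_is_exploit(s, eps0, rng):
    """IS-EXPLOIT flag for round s under epsilon-greedy with eps_s = eps0 / s (clipped to 1)."""
    eps_s = min(1.0, eps0 / s)
    return rng.random() >= eps_s   # explore with probability eps_s, exploit otherwise
```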


The regret proofs for both these algorithms rely on the edge-level mean estimates. We obtain node-level
feedback mean estimates in terms of edge-level estimates for both the frequentist (Theorem 3)
and MLE-based (Theorem 1) approaches. We can use these estimates and adapt the proofs to
characterize the regret.

Thompson Sampling: Thompson Sampling requires a prior on the mean estimates. After observing
the reward in each round, it updates the posterior distribution for each edge. In subsequent
rounds, Thompson Sampling generates samples from the posterior of each edge and performs
exploitation by using these samples as input to the oracle $O$.
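For Bernoulli edge rewards the Beta distribution is the conjugate choice, so the posterior of each edge is determined by its count of live observations and its trigger count $T_{i,s}$. A minimal sketch of the sampling step, assuming a uniform Beta(1, 1) prior for illustration (the experiments in Section 6 instead avoid a prior); the function and argument names are hypothetical:

```python
import random


def thompson_sample(successes, triggers, rng, a0=1.0, b0=1.0):
    """Draw one influence-probability sample per edge from its Beta posterior.

    successes: dict edge -> number of rounds the edge was observed live
    triggers:  dict edge -> T_{i,s}, number of rounds the edge was triggered
    a0, b0:    Beta prior parameters (uniform Beta(1, 1) here, an assumption)
    """
    samples = {}
    for edge, n in triggers.items():
        k = successes.get(edge, 0)
        samples[edge] = rng.betavariate(a0 + k, b0 + n - k)
    return samples
```

The sampled dictionary plays the role of $\hat{\boldsymbol\mu}$ in the EXPLOIT call, so randomness in the samples supplies the exploration.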

Pure Exploitation: This strategy performs exploitation in every round. Since we initially have no
knowledge about the probabilities, exploiting the inaccurate estimates results in some implicit exploration.

6 Experiments 


Goals: Our goal is to evaluate the various algorithms with respect to the regret achieved and the 
error in the influence probabilities learned compared to the true probabilities. In addition, we also 
report the running times of key subroutines of the algorithms. 

Datasets: We use four real datasets, Flixster, NetHEPT, Epinions and Flickr, whose characteristics are
summarized in Table 2. Of these, true probabilities are available for the Flixster dataset, as learned
by the Topic-aware IC (TIC) model [4]. Since true probabilities are not available for the other
datasets, we synthetically assign them according to the weighted cascade model [?]: for an edge
$(u, v)$, the influence probability is set to $p_{u,v} = 1/|N_{in}(v)|$. It is worth noting that the weighted cascade
model is commonly used to evaluate influence maximization algorithms whenever true probabilities
and diffusion data are unavailable [?].
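The weighted cascade assignment needs only the in-degree of each node. A minimal sketch, assuming the graph is given as a list of directed $(u, v)$ pairs without duplicates (the function name is hypothetical):

```python
from collections import defaultdict


def weighted_cascade_probs(edges):
    """Weighted cascade model: p_{u,v} = 1 / in-degree(v) for each directed edge (u, v)."""
    in_degree = defaultdict(int)
    for _, v in edges:
        in_degree[v] += 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}
```

The incoming probabilities of every node sum to 1 under this model, so high in-degree nodes receive small per-edge probabilities.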


Dataset | Nodes | Edges | Av. Degree | Max. Degree
NetHEPT | 15K | 31K | 4.12 | 64
Flixster | 29K | 200K | 7 | 186
Epinions | 76K | 509K | 13.4 | 3079
Flickr | 105K | 2.3M | 43.742 | 5425

Table 2: Dataset characteristics

Experimental Setup: The probability estimates of our algorithms are initialized to 0 or set according
to some prior information (see Appendix B). A run consists of playing the CMAB game for $T$
rounds. We simulate the diffusion in the network by sampling a deterministic graph from the
probabilistic graph $G$ in each round: for the purpose of our experiments, we assume that the diffusion
in the real world happened according to this sample. Given a seed set $S$, the real or “true” spread
achieved in a round is the number of nodes reachable from $S$ in the sample. We define the regret
incurred in one round as the true spread of the seed set obtained in batch mode given the true
probabilities minus the true spread of the seed set obtained using the bandit algorithm. Evaluating
both seed sets on the same sample eliminates the randomness in the regret due to sampling. For our
oracle, as well as for batch-mode seed selection, we use the TIM algorithm [30], a state-of-the-art algorithm for IM.
We use a maximum of $T = 1000$ rounds and use $k = 50$ (a standard choice in the IM literature). We
verified that we obtain similar results for other reasonable values of $k$. To reduce the effect of randomness
in seed-set selection by the oracle $O$, all our results are averaged across 3 runs.
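The per-round regret defined above can be sketched as follows: one possible world is sampled by keeping each edge live with probability $p_{u,v}$, and both seed sets are scored by reachability in that same world. The function names and data layout are illustrative assumptions; in the experiments the two seed sets come from the bandit algorithm and from TIM run on the true probabilities.

```python
import random


def sample_world(probs, rng):
    """Sample a deterministic graph: keep each edge (u, v) live with probability p_{u,v}."""
    live = {}
    for (u, v), p in probs.items():
        if rng.random() < p:
            live.setdefault(u, []).append(v)
    return live


def reachable(live, seeds):
    """Number of nodes reachable from the seed set in the sampled world (the 'true' spread)."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in live.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)


def round_regret(probs, bandit_seeds, batch_seeds, rng):
    """Regret for one round: both seed sets are evaluated on the same sampled world."""
    world = sample_world(probs, rng)
    return reachable(world, batch_seeds) - reachable(world, bandit_seeds)
```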

Algorithms Compared: We consider CUCB, Thompson sampling (TS), ε-greedy (EG) and pure
exploitation (PE).² For CUCB, if the update results in any $\bar{\mu}_i$ exceeding $p_{\max}$, then, as in previous works, we
reset it back to $p_{\max}$ [?]. We set $p_{\max} = 0.2$ in our experiments, based on the fact that influence
probabilities in practice tend to be small [?]. For ε-greedy, we found that $\epsilon_0 = 5$ works well, and
we set the exploration parameter in round $s$ to $\epsilon_s = \epsilon_0/s$. To ensure a fair comparison of Thompson
sampling with the other methods, we do not use a prior on the probabilities. We do this by only
sampling probability estimates for those edges which have been triggered at least once. Specifically,
in each round $s$, $\hat{p}_i^s \sim \mathrm{Beta}\big(\sum_{j=1}^{s} X_{i,j},\; T_{i,s} - \sum_{j=1}^{s} X_{i,j}\big)$.

Baseline algorithms: We use random selection (RAND) and highest degree selection (HIGH-DEGREE)
as baseline methods [5].

Feedback Mechanisms: We consider edge-level (EL), node-level frequentist (NL-F) and node-level
maximum likelihood (NL-ML) feedback. For NL-ML, although we obtained reasonable performance
using Zinkevich's framework, we found it to be sensitive to the particular step size selected. For
all our experiments, we thus report results using the Adagrad online learning algorithm [?],
which uses a per-variable step size that decreases roughly as $1/\sqrt{s}$. We set the initial step size $\eta_0$
to 0.85. We use ‘A-M’ to refer to algorithm A with feedback mechanism M. For example, EG-EL
means ε-greedy with edge-level feedback. Looking at all combinations, we test a total of 12
combinations of algorithms/feedback, plus two baselines which ignore feedback. We next present
our experimental results.
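A minimal AdaGrad step for the per-node parameters might look as follows. Each coordinate's step size is $\eta_0$ divided by the square root of its accumulated squared gradients, which behaves like $1/\sqrt{s}$ when gradient magnitudes are roughly constant. The clipping box, the small constant in the denominator and the function name are assumptions, not details given in the text; $\eta_0 = 0.85$ is the value stated above.

```python
import math


def adagrad_step(theta, grad, accum, eta0=0.85, eps=1e-8, lo=1e-6, hi=10.0):
    """One AdaGrad update with per-coordinate step size eta0 / sqrt(sum of squared gradients).

    accum holds the running sum of squared gradients per coordinate and is updated in place;
    the result is clipped to [lo, hi] (an assumption) so that theta stays in a valid range.
    """
    new_theta = []
    for j, (t, g) in enumerate(zip(theta, grad)):
        accum[j] += g * g
        step = eta0 / (math.sqrt(accum[j]) + eps)
        new_theta.append(min(hi, max(lo, t - step * g)))
    return new_theta
```

Coordinates that receive large or frequent gradients get smaller steps, which removes the need to hand-tune a single global step size.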

Running times: In order to characterize the running time of the various algorithms, we present
the running times of their key components, EXPLOIT, EXPLORE and UPDATE, under all
three feedback mechanisms. The time complexity of UPDATE under all three feedback mechanisms is
$O(\#(\text{triggered edges}) \cdot T)$. EXPLORE takes 0.003 seconds to select 50 random seeds for any
dataset. As the number of seeds $k$ and the true influence probabilities increase, the number of triggered edges
increases and UPDATE takes more time (Table 3).


Dataset     EXPLOIT   UPDATE (EL)   UPDATE (NL-F)   UPDATE (NL-ML)
NetHEPT     0.306     0.041         0.0104          0.003
Flixster    1.021     0.017         0.167           0.0396
Epinions    1.050     0.051         1.345           0.0893
Flickr      1.876     0.551         0.7984          0.037

Table 3: Subroutine times (in sec/round) for k = 50


Regret Minimization: We first evaluate the performance of the regret minimization algorithms on NetHEPT assuming edge-level feedback. We plot the average regret, Regret(T)/T, as the number of rounds increases. As can be seen from Figure 1(a), the average regret for PE/EG/TS decreases quickly with the number of rounds. This implies that by the end of 1000 rounds, the bandit approaches estimate the probabilities well enough to give a spread comparable to that of the seeds selected in batch mode given the true probabilities. Pure exploitation achieves the best average regret at the end of 1000 rounds. This is not uncommon when the rewards are noisy [?]. Initially, with unknown probabilities, rewards are noisy in our case, so exploiting and greedily choosing the best superarm often leads to very good results. Random seed selection has the worst regret, which is constant. For the initial rounds, selecting seeds by highest degree has a lower regret than the other methods. As the number of rounds increases, the influence probabilities become more accurate and the IM oracle O outputs seed sets leading to higher spread than HIGH-DEGREE (this suggests we might reasonably consider a hybrid of these two approaches). We also observe that the regret for CUCB decreases very slowly. CUCB is biased towards exploring edges which have not been triggered often. Since typical networks contain numerous edges, CUCB ends up exploring much more than necessary, resulting in a slow decrease in regret. We observe this behaviour on the other datasets as well, so we omit CUCB from further plots. We also omit RAND from further plots to keep them simple.

2 Abbreviations used in plots are in parentheses.
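The average regret plotted in the figures can be computed from per-round spreads as sketched below. We assume the per-round regret is the gap between the spread of the batch-mode seeds under the true probabilities (treated as a fixed number) and the spread obtained in round $s$; the function name is hypothetical.

```python
from itertools import accumulate

def average_regret(opt_spread, spreads):
    """Return Regret(T)/T for T = 1, ..., len(spreads).

    opt_spread: spread of the seeds chosen in batch mode with the true
    probabilities (assumed fixed across rounds).
    spreads[s]: spread achieved by the seed set chosen in round s + 1.
    """
    per_round = [opt_spread - x for x in spreads]
    return [total / t for t, total in enumerate(accumulate(per_round), start=1)]
```

A method that keeps improving drives the running average down even though early rounds always contribute to it, which is why the curves decrease gradually rather than dropping to zero.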




Figure 1: Regret vs number of rounds for NetHEPT, k = 50 (average regret vs round; panel (b): NetHEPT - NL)

Figure 2: Regret vs number of rounds for different algorithms (average regret vs round; panel (a): Flixster)

To examine the effect of the feedback mechanism on regret, we plot the average regret under different feedback mechanisms in Figure 1(b). For NetHEPT, the regret decreases quickly under both node-level feedback mechanisms and is close to that obtained with edge-level feedback. For NetHEPT with k = 50, the average number of active parents for a node is 1.175. Previous work has shown that the probabilities learned from diffusion cascades are generally small [28][18][26]. For example, if $p_{\min} = 0$ and $p_{\max}$ varies from 0.01 to 0.2, the failure probability $\rho$ (calculated according to the equation) varies from 0.0115 to 0.2261. This is true for all our datasets. Thus, as the number of active parents increases, credit distribution becomes more difficult and credit distribution using maximum likelihood becomes more effective. For all our datasets, the regret using either node-level feedback mechanism is close to that obtained using edge-level feedback. For the other datasets, to reduce clutter, we only plot regret for the node-level feedback mechanisms (Figure 2). For Flixster and Epinions, both NL-F and NL-ML are effective for all regret minimization algorithms, with TS and PE obtaining the lowest regret. Interestingly, for Epinions HIGH-DEGREE is a competitive baseline and has low regret. For Flickr, because of the large size of the graph, it is challenging to find a good seed set with partially learned probabilities. As a result, the average regret after 1000 rounds is higher than for the other datasets. We observe that TS and PE both settle on a locally optimal seed set. However, because of its exploration phase, EG is able to find a much better seed set and consequently converges to a much lower regret. To verify this, we plot the relative L2 error in the edge probabilities against the number of rounds.
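The text does not define the relative L2 error explicitly. Assuming it is $\|\hat{\mu} - \mu\|_2 / \|\mu\|_2$ over all edges, it can be computed as follows; treating edges absent from the estimates as 0.0 is our own hypothetical convention.

```python
import math

def relative_l2_error(estimated, true):
    """||mu_hat - mu||_2 / ||mu||_2 over the edges in `true` (assumed definition).

    estimated, true: dicts mapping edge -> probability. Edges missing from
    `estimated` are treated as having estimate 0.0 (hypothetical convention).
    """
    num = math.sqrt(sum((estimated.get(e, 0.0) - p) ** 2 for e, p in true.items()))
    den = math.sqrt(sum(p * p for p in true.values()))
    return num / den
```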


Figure 3: Flickr, k = 50: relative L2 error vs rounds


Quality of learning edge probabilities: As is evident from Figure 3, the mean estimates improve as the rounds progress and the relative L2 error goes down over time. This leads to better estimates of the expected spread, and the quality of the chosen seeds improves. The true spread achieved thus increases and hence the average regret goes down. For both PE and TS, the decrease in L2 error saturates relatively fast, which implies that both of them narrow down on a seed set quickly. They subsequently stop learning about other edges in the network. In contrast, ε-greedy does a fair bit of exploration and hence achieves a lower L2 error.


7 Conclusion 

We studied the important but under-researched problem of influence maximization when no influence probabilities or diffusion cascades are available. We adopted a combinatorial multi-armed bandit paradigm and used algorithms from the bandits literature to minimize the loss in spread due to lack of knowledge of influence probabilities. We also evaluated their empirical performance on four real datasets. It would be interesting to extend the framework to learn not just influence probabilities, but the graph structure as well.

A Proofs 


Theorem 4. Let $p^E_{u_i,v}$ and $p^N_{u_i,v}$ respectively denote the probability estimates learned from edge-level and node-level feedback using maximum likelihood. We have:

$$\left|p^N_{u_i,v} - p^E_{u_i,v}\right| \le \max\left(p^N_{u_i,v} + \phi - \frac{1}{P'_{\max}+1},\; \frac{1}{P'_{\min}+1} - p^N_{u_i,v} - \phi\right)$$

where $\phi$ is the fraction of the cascades triggering edge $(u_i, v)$ in which the edge is dead while $v$ is active, and $P'_{\max}$ and $P'_{\min}$ are the upper and lower bounds on the quantity $P_c(1 - p^N_{u_i,v}) / (1 - P_c(1 - p^N_{u_i,v}))$ defined in the proof.


Proof. We want to estimate the error in the probability of the edge $(u_i, v)$ when using the maximum likelihood approach for credit distribution. Let $F_{\mathcal{E}}$ be the number of instances in which the event $\mathcal{E}$ failed. For example, $F_{u_i,v}$ is the number of times edge $(u_i, v)$ is dead and $F_v$ is the number of times node $v$ is inactive. Similarly, let $S_{\mathcal{E}}$ be the number of successful events and $T_{\mathcal{E}}$ the total number of events. Clearly, $S_{\mathcal{E}} + F_{\mathcal{E}} = T_{\mathcal{E}}$.

Let $v$ be a node under consideration at time $t$. Let $B_c = \{u_1, \ldots, u_{K_c}\}$ be its set of active parents (which became active at timestamp $t-1$ in the diffusion process) for cascade $c$. In our case, $S_v + F_v = S_{u_i,v} + F_{u_i,v} = T_{u_i,v}$, where $T_{u_i,v}$ is the number of rounds in which the edge $(u_i, v)$ is triggered. Let $p^E_{u_i,v}$ and $p^N_{u_i,v}$ denote the learnt probability estimates under edge-level and node-level feedback respectively. The update using edge-level feedback implies $p^E_{u_i,v} = S_{u_i,v} / T_{u_i,v}$.

If $v$ is not active at $t$, then the activation attempts from all active parents failed and the corresponding edge $(u_i, v)$ is dead. If $v$ is activated, edge $(u_i, v)$ may or may not be live; under node-level feedback, we cannot observe its status. Let $S^s_v$ be the number of times node $v$ is active because of a successful activation attempt through edge $(u_i, v)$, and let $S^f_v$ be the number of times node $v$ became active because of an active parent other than $u_i$, i.e., edge $(u_i, v)$ is dead. We then have the following relations:

$$S_v = S^s_v + S^f_v \qquad (11)$$
$$S_{u_i,v} = S^s_v \qquad (12)$$
$$F_{u_i,v} = S^f_v + F_v \qquad (13)$$

The gradient of $\ln(LL_v)$ w.r.t. $p^N_{u_i,v}$ can be written as:

$$\frac{\partial (\ln LL_v)}{\partial p^N_{u_i,v}} = -\frac{F_v}{1 - p^N_{u_i,v}} + \sum_{c=1}^{S_v} \frac{P_c}{1 - P_c(1 - p^N_{u_i,v})} \qquad (14)$$

Here, $P_c = \prod_{k \ne i} (1 - p^N_{u_k,v})$ is a product over the active parents of $v$ in cascade $c$, i.e., it is the probability that in cascade $c$ all active parents of $v$ other than $u_i$ failed.

To obtain the probability estimates under node-level feedback, we set $\partial (\ln LL_v) / \partial p^N_{u_i,v} = 0$, which implies:

$$\frac{F_v}{1 - p^N_{u_i,v}} = \sum_{c=1}^{S_v} \frac{P_c}{1 - P_c(1 - p^N_{u_i,v})} \qquad (15)$$

Let the maximum and minimum values of $P_c$ be $P_{\max}$ and $P_{\min}$ respectively. Since $x/(1-x)$ is increasing in $x$, the quantity $\frac{P_c(1 - p^N_{u_i,v})}{1 - P_c(1 - p^N_{u_i,v})}$ is bounded by $P'_{\min}$ and $P'_{\max}$, where $P'_{\max} = \frac{P_{\max}(1 - p^N_{u_i,v})}{1 - P_{\max}(1 - p^N_{u_i,v})}$, and similarly for $P'_{\min}$. Hence,

$$S_v P'_{\min} \le \sum_{c=1}^{S_v} \frac{P_c(1 - p^N_{u_i,v})}{1 - P_c(1 - p^N_{u_i,v})} \le S_v P'_{\max} \qquad (16)$$

Multiplying both sides of Eq. (15) by $1 - p^N_{u_i,v}$ and applying Eq. (16):

$$P'_{\min} \le \frac{F_v}{S_v} \le P'_{\max} \qquad (17)$$

From Eqs. (11)-(13) and (17):

$$P'_{\min} \le \frac{F_{u_i,v} - S^f_v}{S_{u_i,v} + S^f_v} \le P'_{\max} \qquad (18)$$

Since $S_{u_i,v} + F_{u_i,v} = T_{u_i,v}$, the middle term equals $T_{u_i,v} / (S_{u_i,v} + S^f_v) - 1$, and therefore

$$\frac{1}{P'_{\min} + 1} \ge p^E_{u_i,v} + \frac{S^f_v}{T_{u_i,v}} \ge \frac{1}{P'_{\max} + 1} \qquad (19)$$

Let $\phi = S^f_v / T_{u_i,v}$. $\phi$ depends on the structure of the network and the true probabilities. Note that $\mathbb{E}[\phi] = (1 - p^*_{u_i,v})\left(1 - \prod_{j \ne i}(1 - p^*_{u_j,v})\right)$, where $p^*$ denotes true probabilities.

If $p^N_{u_i,v} > p^E_{u_i,v}$, let $p^N_{u_i,v} - p^E_{u_i,v} = \epsilon_1$. Plugging this into Eq. (19), we have

$$\epsilon_1 \le p^N_{u_i,v} + \phi - \frac{1}{P'_{\max} + 1} \qquad (20)$$

If $p^N_{u_i,v} < p^E_{u_i,v}$, let $p^E_{u_i,v} - p^N_{u_i,v} = \epsilon_2$. Plugging this into Eq. (19), we have

$$\epsilon_2 \le \frac{1}{P'_{\min} + 1} - p^N_{u_i,v} - \phi \qquad (21)$$

If $\epsilon$ is the error in estimation, i.e., $|p^N_{u_i,v} - p^E_{u_i,v}| \le \epsilon$, then from Eqs. (20) and (21) we have

$$\epsilon \le \max\left(p^N_{u_i,v} + \phi - \frac{1}{P'_{\max} + 1},\; \frac{1}{P'_{\min} + 1} - p^N_{u_i,v} - \phi\right),$$

which was to be shown.
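The step from Eq. (15) to Eq. (17) can be sanity-checked numerically. The sketch below solves Eq. (15) for a single edge by bisection, which works because the gradient in Eq. (14) is strictly decreasing in $p^N_{u_i,v}$, and then checks that $F_v/S_v$ lies between the smallest and largest per-cascade terms. It is an illustrative check under the assumption $0 \le P_c < 1$, not the NL-ML update used in the experiments, and the function names are hypothetical.

```python
def nl_ml_gradient(p, F_v, P_cs):
    """Eq. (14): d ln LL_v / d p for one edge (u_i, v).

    F_v: number of cascades in which v stayed inactive.
    P_cs: one value P_c (assumed in [0, 1)) per cascade in which v became active.
    """
    return -F_v / (1.0 - p) + sum(P / (1.0 - P * (1.0 - p)) for P in P_cs)

def solve_eq15(F_v, P_cs, tol=1e-12):
    """Root of Eq. (14) in (0, 1) by bisection; the gradient decreases in p."""
    lo, hi = 0.0, 1.0 - 1e-12
    if nl_ml_gradient(lo, F_v, P_cs) <= 0:
        return lo  # likelihood is maximized at the boundary p = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nl_ml_gradient(mid, F_v, P_cs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def p_prime(P, p):
    """Per-cascade term P(1 - p) / (1 - P(1 - p)) bounded in Eq. (16)."""
    x = P * (1.0 - p)
    return x / (1.0 - x)
```

For example, with $F_v = 1$ and $P_c \in \{0.5, 0.8\}$ the root is $p^N = 0.5$; the two terms are $1/3$ and $2/3$, they sum to $F_v = 1$ as Eq. (15) requires, and $F_v/S_v = 0.5$ lies between them, as in Eq. (17).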


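□

Theorem 5. If we use Eq. (7) for updating $\theta$ with $\eta_s$ decreasing as $1/\sqrt{s}$, the following holds:

$$\sum_{s=1}^{T} \left(L^s_v(\theta_{batch}) - L^s_v(\theta_s)\right) \le \frac{d\,\theta_{\max}^2 \sqrt{T}}{2} + \left(\sqrt{T} - \frac{1}{2}\right)\Gamma^2 \qquad (22)$$

where $\Gamma = \max_{s \in [T]} \|\nabla(-L^s_v(\theta_s))\|_2$ is the maximum L2-norm of the gradient of the negative likelihood function over all rounds.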


Proof. The proof is an adaptation of the following loss result established in [32]:

$$\mathrm{Loss}(T) \le \frac{\mathrm{diam}(F)^2 \sqrt{T}}{2} + \left(\sqrt{T} - \frac{1}{2}\right)\|\nabla c_{\max}\|^2 \qquad (23)$$

where $\|\nabla c_{\max}\|$ is the maximum gradient norm obtained across the $T$ rounds in the framework of [32]. Turning to our setting, let the true influence probabilities lie in the range $(0, p_{\max}]$ for some $p_{\max}$. Then the $\theta$ values for the various edges lie in the range $(0, \theta_{\max}]$, where $\theta_{\max} = -\ln(1 - p_{\max})$. Our optimization variables are $\theta$, and the cost function $c_s$ in our setting is $-L^s_v$, $1 \le s \le T$. Furthermore, in our case, $\mathrm{diam}(F) = \sqrt{d}\,\theta_{\max}$, since this is the maximum distance between any two $\theta$-vectors, and $\mathrm{Loss}(T) = \sum_{s=1}^{T} \left(L^s_v(\theta_{batch}) - L^s_v(\theta_s)\right)$. Substituting these values into Eq. (23), we obtain Eq. (22), proving the theorem. □
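The right-hand side of Eq. (22) can be evaluated directly to see how the bound scales. In this sketch, `d` is taken to be the number of $\theta$ variables (an assumption, since $d$ is not defined in this section), and the helper names are hypothetical.

```python
import math

def theta_max(p_max):
    """Largest theta = -ln(1 - p) when probabilities lie in (0, p_max]."""
    return -math.log(1.0 - p_max)

def regret_bound(d, th_max, T, gamma):
    """Right-hand side of Eq. (22): d*th_max^2*sqrt(T)/2 + (sqrt(T) - 1/2)*gamma^2."""
    return d * th_max ** 2 * math.sqrt(T) / 2 + (math.sqrt(T) - 0.5) * gamma ** 2
```

Because the bound grows as $O(\sqrt{T})$, the per-round loss bound `regret_bound(...) / T` shrinks as $O(1/\sqrt{T})$. With $p_{\max} = 0.2$, $\theta_{\max} = -\ln 0.8 \approx 0.2231$.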

Theorem 6. Let $p_{\min}$ and $p_{\max}$ be the minimum and maximum true influence probabilities in the network. Consider a particular cascade $c$ and any active node $v$ with $K_c$ active parents. The failure probability $\rho$ for node $v$ under frequentist node-level feedback is characterized by:

$$\rho \le \frac{1}{K_c}(1 - p_{\min})\left[1 - \prod_{k=1, k \ne i}^{K_c} (1 - p_{\max})\right] + \left(1 - \frac{1}{K_c}\right) p_{\max} \qquad (24)$$

Suppose $\mu^E_i$ and $\mu^N_i$ are the inferred influence probabilities for the edge corresponding to arm $i$ using edge-level and node-level feedback respectively. Then the relative error in the learned influence probability is given by:

$$\frac{\mu^N_i - \mu^E_i}{\mu^E_i} = \rho\left(\frac{1}{\mu^E_i} - 2\right) \qquad (25)$$


Proof. Consider any active node $v$ with $K_c$ active parents, and consider updating the influence probability of the edge $(u_i, v)$. We may infer the edge $(u_i, v)$ to be live or dead. Our credit assignment makes an error when the edge is live and inferred to be dead, and vice versa. Recall that all probabilities are conditioned on the fact that node $v$ is active at time $t$ and $K_c$ of its parents $(u_1, \ldots, u_i, \ldots, u_{K_c})$ became active at time $t-1$. Let $\mathcal{E}_d$ ($\mathcal{E}_l$) be the event that the edge $(u_i, v)$ is dead (resp., live) in the true world. Hence we can characterize the failure probability as follows:

$$\rho = \Pr[(u_i,v) \text{ inferred live}] \Pr[\mathcal{E}_d \mid v \text{ is active at } t] + \Pr[(u_i,v) \text{ inferred dead}] \Pr[\mathcal{E}_l \mid v \text{ is active at } t]$$

If $(u_i, v)$ is live in the true world, then node $v$ will be active at time $t$ irrespective of the status of the edges $(u_j, v)$, $j \in [K_c]$, $j \ne i$. Hence, $\Pr[\mathcal{E}_l \mid v \text{ is active at } t] = \Pr[\mathcal{E}_l]$.

By definition of the independent cascade model, the statuses of edges are independent of each other. Hence,

$$\Pr[\mathcal{E}_d \mid v \text{ is active at } t] = \Pr[\mathcal{E}_d \wedge \exists j \ne i \text{ s.t. } (u_j,v) \text{ is live}]$$
$$\rho = \Pr[(u_i,v) \text{ inferred live}] \Pr[\mathcal{E}_d \wedge \exists j \ne i \text{ s.t. } (u_j,v) \text{ is live}] + \Pr[(u_i,v) \text{ inferred dead}] \Pr[\mathcal{E}_l]$$

Let $p_{u_j,v}$ be the true influence probability for the edge $(u_j, v)$, $j \in [K_c]$. Thus, $\Pr[\mathcal{E}_l] = p_{u_i,v}$ and

$$\Pr[\mathcal{E}_d \wedge \exists j \ne i \text{ s.t. } (u_j,v) \text{ is live}] = (1 - \Pr[\mathcal{E}_l])\left[1 - \prod_{k=1, k \ne i}^{K_c} (1 - p_{u_k,v})\right]$$

Since one of the active parents is chosen at random and assigned credit, $\Pr[\text{choosing } u_i \text{ for credit}] = \Pr[(u_i,v) \text{ inferred live}] = 1/K_c$. We thus obtain:

$$\rho = \frac{1}{K_c}(1 - p_{u_i,v})\left[1 - \prod_{k=1, k \ne i}^{K_c} (1 - p_{u_k,v})\right] + \left(1 - \frac{1}{K_c}\right) p_{u_i,v} \qquad (26)$$

Let $p_{\min}$ ($p_{\max}$) denote the minimum (resp. maximum) true influence probability of any edge in the network. Plugging these into Eq. (26) gives the upper bound in Eq. (24), the first part of the theorem. Let $\mu^N_i$ and $\mu^E_i$ denote the mean estimates using node-level and edge-level feedback respectively, that is, the influence probabilities of edge $(u_i, v)$ learned under node- and edge-level feedback. We next quantify the error in $\mu^N_i$ relative to $\mu^E_i$. Let $X^N_{i,s}$ be the status of the edge corresponding to arm $i$ inferred using our credit assignment scheme at round $s$. Recall that under both edge-level and node-level feedback, the mean is estimated using the frequentist approach, that is, $\mu^N_i = \sum_{s=1}^{T} X^N_{i,s} / T_i$ (similarly for edge-level feedback). Note that $X_{i,s}$ denotes the true reward (for edge-level feedback), whereas $X^N_{i,s}$ denotes the inferred reward under node-level feedback, using the credit assignment scheme described earlier. Thus, for each successful true activation of arm $i$ (i.e., $X_{i,s} = 1$) we obtain $X^N_{i,s} = 1$ with probability $1 - \rho$, and for each unsuccessful true activation we obtain $X^N_{i,s} = 1$ with probability $\rho$. Let $S_i$ denote the number of rounds in which the true reward $X_{i,s} = 1$. Hence, with Eq. (28) holding in expectation, we have:

$$\mu^E_i = \frac{S_i}{T_i} \qquad (27)$$
$$\mu^N_i = \frac{S_i(1 - \rho) + (T_i - S_i)\rho}{T_i} \qquad (28)$$

Subtracting Eq. (27) from Eq. (28) gives $\mu^N_i - \mu^E_i = \rho(1 - 2\mu^E_i)$; dividing by $\mu^E_i$ yields Eq. (25), the second part of the theorem. □
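Eqs. (26) and (28) are easy to evaluate for concrete probabilities, which also confirms the algebra behind Eq. (25). The sketch below is illustrative; the function names and the list-based representation of the other parents' edge probabilities are hypothetical.

```python
import math

def failure_probability(p_i, p_others):
    """Eq. (26): failure probability rho for edge (u_i, v).

    p_i: true probability of edge (u_i, v).
    p_others: true probabilities of the edges from the other K_c - 1 active parents.
    """
    K = 1 + len(p_others)
    others_fail = math.prod(1.0 - p for p in p_others)  # empty product is 1
    return (1.0 / K) * (1.0 - p_i) * (1.0 - others_fail) + (1.0 - 1.0 / K) * p_i

def node_level_mean(mu_E, rho):
    """Eq. (28) divided through by T_i: mu_N = mu_E (1 - rho) + (1 - mu_E) rho."""
    return mu_E * (1.0 - rho) + (1.0 - mu_E) * rho
```

With a single active parent ($K_c = 1$) credit assignment is exact and $\rho = 0$; with two parents whose edges both have probability 0.2, $\rho = 0.5 \cdot 0.8 \cdot 0.2 + 0.5 \cdot 0.2 = 0.18$.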


B Effect of prior 


For typical social networks, we may have an idea of the range of influence probabilities. E.g., we may know that the influence probabilities lie in the range [0.0005, 0.2] for a given network. If available, we can use this domain-specific information to better initialize the influence probability estimates. For the maximum likelihood approach, these initial estimates can be important for faster convergence of the gradient descent method. For the frequentist approaches (both edge-level and node-level), where the updates are binary, i.e., the $X_{i,t}$ follow a Bernoulli distribution, the initialization can be treated as a Beta prior characterized by the parameters $\alpha$ and $\beta$, whose mean is $\alpha / (\alpha + \beta)$. The Beta distribution is the conjugate prior of the Bernoulli distribution, so the posterior also follows a Beta distribution. Taking the posterior mean results in a modified update rule given by $\mu_i = \frac{\alpha + \sum_{t} X_{i,t}}{\alpha + \beta + T_i}$. Hence the Beta prior parameters act like pseudo-counts in the update formula. For the maximum likelihood method, we initialize the $p$ estimates randomly between 0 and [...]. We use a prior with $\alpha = 1$ and $\beta = 19$ (similar to [23]) and show its effect (Figure 4) on the Flickr dataset for the best-performing ε-greedy algorithm. In this figure, ELP shows the regret for edge-level feedback with the prior; the labels for the other feedback mechanisms follow the same pattern.

Figure 4: Effect of prior for Flickr, k = 50

Figure 5: Network Exploration for Flixster, k = 50 ((a) relative L2 error vs rounds; (b) fraction of edges within 10% relative error)
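The pseudo-count interpretation of the Beta prior can be seen directly from the posterior mean. This sketch uses the prior from the experiments ($\alpha = 1$, $\beta = 19$) as defaults; the function name is hypothetical.

```python
def posterior_mean(successes, triggers, alpha=1.0, beta=19.0):
    """Posterior mean of a Bernoulli parameter under a Beta(alpha, beta) prior.

    successes: rounds in which the edge was (inferred) live.
    triggers: T_i, the number of rounds in which the edge was triggered.
    alpha, beta act as pseudo-counts of prior successes and failures.
    """
    return (alpha + successes) / (alpha + beta + triggers)
```

Before any feedback, the estimate equals the prior mean $1/20 = 0.05$. As $T_i$ grows, the pseudo-counts are swamped by the data and the estimate approaches the frequentist $S_i / T_i$; with $\alpha = \beta = 0$ it reduces exactly to $S_i / T_i$ for $T_i > 0$.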


C Network Exploration 


Instead of minimizing the regret and doing well on the IM task, one might be interested in exploring the network and obtaining good estimates of the network probabilities. We refer to this alternative task as network exploration. The objective of network exploration is to obtain good estimates of the network's influence probabilities, regardless of the loss in spread in each round; it thus requires pure exploration of the arms. We therefore seek to minimize the error in the learned (i.e., estimated) influence probabilities $\hat{\mu}$ w.r.t. the true influence probabilities $\mu$, i.e., minimize $\|\hat{\mu} - \mu\|_2$. We study two exploration strategies: random exploration, which chooses a random superarm in each round, and strategic exploration, which chooses the superarm that triggers the maximum number of edges which have not been sampled sufficiently often.

Strategic Exploration: Random exploration does not use information from previous rounds to select seeds and explore the network. On the other hand, a pure exploitation strategy selects a seed set according to the estimated probabilities in every round. This leads to the selection of a seed set which results in a high spread and consequently triggers a large set of edges. However, after some rounds, it stabilizes, choosing the same or a similar seed set in each round. Thus a large part of the network may remain unexplored. We combine ideas from these two extremes and propose a strategic exploration algorithm: in each round $s$, select a seed set which will trigger the maximum number of edges that have not been explored sufficiently often up to this round. We instantiate this intuition below.
Recall that $T_i$ is the number of times arm $i$ (edge $(u_i, v)$) has been triggered, equivalently, the number of times $u_i$ was active in the $T$ cascades. Writing this in explicit notation, let $T^s_{u,v}$ be the number of times the edge $(u, v)$ has been triggered in cascades 1 through $s$, $s \in [T]$. Define $\mathrm{value}(u) := \sum_{v \in N^{out}(u)} \frac{1}{T^s_{u,v} + 1}$. The higher the value of a node, the more unexplored (or less frequently explored) out-edges it has. Define the value-spread of a set $S \subseteq V$ exactly as the expected spread $\sigma(S)$, but instead of counting activated nodes, we add up their values. Then, we can choose seeds with the maximum marginal value-spread gain w.r.t. previously chosen seeds. Intuitively, this strategy chooses seeds that cause a large number of unexplored (or less often explored) edges to be explored in the next round. We call this strategic exploration (SE). Note that SE updates the value of each node dynamically across rounds, so it should effectively maximize the amount of exploration across the network.
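A minimal sketch of strategic exploration is shown below. It makes several assumptions of our own: value-spread is estimated by naive Monte Carlo simulation of the independent cascade model under the current probability estimates, a plain greedy loop stands in for the IM oracle, and $\mathrm{value}(u)$ uses the $1/(T^s_{u,v} + 1)$ form. All helper names are hypothetical.

```python
import random

def node_values(out_edges, trigger_counts):
    """value(u) = sum over out-neighbours v of 1 / (T_{u,v} + 1)."""
    return {u: sum(1.0 / (trigger_counts.get((u, v), 0) + 1) for v in nbrs)
            for u, nbrs in out_edges.items()}

def value_spread(seeds, out_edges, probs, values, n_sims=200, rng=random):
    """Monte Carlo estimate of value-spread under IC with estimated probs."""
    total = 0.0
    for _ in range(n_sims):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in out_edges.get(u, []):
                    # each newly triggered edge fires once with its estimated probability
                    if v not in active and rng.random() < probs.get((u, v), 0.0):
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += sum(values.get(u, 0.0) for u in active)  # sum values, not counts
    return total / n_sims

def strategic_exploration(k, out_edges, probs, trigger_counts, n_sims=200, rng=random):
    """Greedily pick k seeds with maximum marginal value-spread gain.

    Subtracting the current value-spread does not change the argmax, so each
    step simply maximizes value_spread(seeds + [u]).
    """
    values = node_values(out_edges, trigger_counts)
    nodes = set(out_edges) | {v for nbrs in out_edges.values() for v in nbrs}
    seeds = []
    for _ in range(k):
        best = max(nodes - set(seeds),
                   key=lambda u: value_spread(seeds + [u], out_edges, probs,
                                              values, n_sims, rng))
        seeds.append(best)
    return seeds
```

On a toy graph where `a` has two never-triggered out-edges and `b` has one edge triggered nine times, SE picks `a` first, since activating it reaches the least-explored part of the graph. Ties between candidates are broken by set iteration order, which this sketch does not control.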

We show results on the Flixster dataset. Figure 5(a) shows the L2 error obtained with the Random Exploration and Strategic Exploration strategies, coupled with the edge-level and frequentist node-level feedback mechanisms. First, strategic exploration is better than just choosing nodes at random, because it incorporates feedback from the previous rounds and explicitly tries to avoid edges which have been sampled often. As expected, edge-level feedback shows the faster decrease in error. In Figure 5(b) we plot the fraction of edges whose estimates are within a relative error of 10% of their true probabilities. Since we have the flexibility to generate cascades to learn about the hitherto unexplored parts of the network, our network exploration algorithms can achieve a far lower sample complexity than algorithms that try to learn the probabilities from a given set of cascades. This is similar to the benefits obtained using active learning as compared to supervised learning.
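The metric in Figure 5(b) can be computed as sketched below, assuming "within a relative error of 10%" means $|\hat{\mu}_e - \mu_e| \le 0.1\,\mu_e$ for each edge $e$. Treating missing estimates as 0.0 is our own hypothetical convention, as is the function name.

```python
def fraction_within(estimated, true, rel_tol=0.10):
    """Fraction of edges e with |mu_hat_e - mu_e| <= rel_tol * mu_e.

    estimated, true: dicts mapping edge -> probability; edges missing from
    `estimated` are treated as having estimate 0.0 (hypothetical convention).
    """
    hits = sum(1 for e, p in true.items()
               if abs(estimated.get(e, 0.0) - p) <= rel_tol * p)
    return hits / len(true)
```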